Method of using transcript information to identify and learn commercial portions of a
program



BACKGROUND OF THE INVENTION 

Field of the Invention

The present invention is directed to a method of and a television viewing
system for identifying and learning commercials during a program such as a broadcast
television program, and more specifically to identifying and learning commercials during a
broadcast television program using transcript information.

Description of the Related Art 

Television viewing systems are available which automatically detect selected
segments of a television signal such as commercial advertisements or undesired portions of
the program. These commercial detection systems are typically used to mute the audio
portion of the television broadcast when the undesired portion of the program appears, or for
controlling a video player to skip the undesired portion of the program during recording or
replay. Although a wide variety of techniques have been developed for detecting selected
segments of television programs, none of the prior art systems monitor the transcript
information (e.g., closed-captioned signal) of a television program to identify and learn the
commercial portions which occur during the program. In addition, none of the prior art
systems identify, segment and store individual commercials which occur during a
commercial segment of the program for later use, for example, to create a library of
commercials to identify corresponding commercial portions of subsequent television
broadcasts.



OBJECTS AND SUMMARY OF THE INVENTION 

It is therefore an object of the present invention to provide a method which
identifies and learns commercial portions of a broadcast program.

It is another object of the present invention to provide a method which
monitors the transcript information corresponding to a broadcast program to identify and
learn commercial portions of the broadcast program.

It is a further object of the present invention to provide a method which
identifies, segments and learns individual commercials which are broadcast during a
commercial segment of a broadcast program by analyzing the transcript information
associated therewith.

It is a further object of the present invention to provide a method for
identifying and learning commercial portions of a broadcast program which overcomes
inherent disadvantages of known commercial detection methods.

It is a further object of the present invention to provide a television viewing
system for identifying and learning commercial segments which occur during a program.

In accordance with one form of the present invention, a method of identifying
commercial segments during a program includes the steps of using transcript information
associated with the program, detecting "non-stop" words in the transcript information during
a first time period which occur more than a predetermined number of times, detecting
"non-stop" words in the transcript information during a second time period which occur more
than a predetermined number of times, and comparing the "non-stop" words detected during the
first time period and the "non-stop" words detected during the second time period.

In accordance with another form of the present invention, a method of learning
and storing commercial segments which occur during a program includes the steps of
identifying a possible commercial segment which occurs during the program, comparing
"non-stop" words of the possible commercial segment with "non-stop" words of each of a list
of probable commercial segments previously identified to determine at least one matching
probable commercial segment, comparing transcript text of the possible commercial segment
with transcript text of the at least one matching probable commercial segment, storing the
transcript text which is common to both the possible commercial segment and the at least one
matching probable commercial segment, removing the at least one matching stored probable
commercial segment from the list of probable commercial segments, and adding the at least
one matching probable commercial segment to a list of candidate commercial segments.

In accordance with another form of the present invention, a method of learning
and storing commercial segments which occur during a program includes the steps of
identifying a possible commercial segment which occurs during the program, comparing
"non-stop" words of the possible commercial segment with "non-stop" words of each of a list
of candidate commercial segments previously identified to determine at least one matching
candidate commercial segment, comparing transcript text of the possible commercial
segment with transcript text of the at least one matching candidate commercial segment,
storing the transcript text which is common to both the possible commercial segment and the
at least one matching candidate commercial segment, removing the at least one matching
candidate commercial segment from the list of candidate commercial segments, and adding
the at least one matching candidate commercial segment to a list of found commercial
segments.

In accordance with another form of the present invention, a method of learning
and storing commercial segments which occur during a program includes the steps of
identifying a possible commercial segment which occurs during the program, comparing
"non-stop" words of the possible commercial segment with "non-stop" words of each of a list
of found commercial segments previously identified to determine at least one matching found
commercial segment, comparing the transcript text of the possible commercial segment with
transcript text of the at least one matching found commercial segment, storing the transcript
text which is common to both the possible commercial segment and the at least one matching
found commercial segment, and incrementing a counter which indicates the frequency of
occurrence of the at least one matching found commercial segment. The method also
includes adding the found commercial segment to a found commercial list.
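
The three learning forms above can be read as one promotion scheme: a segment moves from the probable list to the candidate list to the found list as it is seen again, and a found segment has its counter incremented on each further sighting. The following Python sketch illustrates that reading only. The dictionary layout, the function names, the order in which the lists are checked, the initial counter value, the rule that an unmatched segment joins the probable list, and the use of `difflib` to compute the common text are all assumptions made for illustration and are not taken from the specification.

```python
from difflib import SequenceMatcher

def common_text(text_a, text_b):
    """Longest run of consecutive words shared by both texts (illustrative stand-in)."""
    wa, wb = text_a.split(), text_b.split()
    m = SequenceMatcher(None, wa, wb, autojunk=False).find_longest_match(
        0, len(wa), 0, len(wb))
    return " ".join(wa[m.a:m.a + m.size])

def first_match(possible, segments):
    """Return the first stored segment sharing a non-stop word with `possible`."""
    for seg in segments:
        if possible["words"] & seg["words"]:
            return seg
    return None

def learn(possible, probable, candidate, found):
    """Promote a matching stored segment one list up; returns the list it ends in."""
    match = first_match(possible, found)
    if match is not None:
        match["text"] = common_text(possible["text"], match["text"])
        match["count"] += 1          # frequency-of-occurrence counter
        return "found"
    match = first_match(possible, candidate)
    if match is not None:
        candidate.remove(match)
        match["text"] = common_text(possible["text"], match["text"])
        match["count"] = 1           # assumed starting value for the counter
        found.append(match)
        return "found"
    match = first_match(possible, probable)
    if match is not None:
        probable.remove(match)
        match["text"] = common_text(possible["text"], match["text"])
        candidate.append(match)
        return "candidate"
    probable.append(dict(possible))  # assumption: unmatched segments start here
    return "probable"
```

Each call keeps only the transcript text common to the new sighting and the stored segment, so text belonging to neighbouring commercials tends to be trimmed away as a commercial is seen repeatedly.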

In accordance with another form of the present invention, a method of
retrieving a stored commercial segment includes the steps of identifying at least one non-stop
word indicative of a commercial segment which is desired, identifying stored commercial
segments which correspond to the identified non-stop word, and outputting the identified
stored commercial segments which correspond to the identified non-stop words. The method
further includes marking the identified stored commercial segment as a commercial area.
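
As a rough illustration of the retrieval form above, the sketch below returns every stored segment whose non-stop words include at least one of the requested words. The record layout (`id`, `words`) and the function name are hypothetical, and a practical system would likely use an index rather than a linear scan.

```python
def retrieve_commercials(query_words, stored_segments):
    """Return stored segments that share at least one non-stop word with the query."""
    wanted = {word.lower() for word in query_words}
    return [seg for seg in stored_segments if wanted & seg["words"]]

# Hypothetical stored library; each entry keeps its lower-cased non-stop words.
library = [
    {"id": 1, "words": {"honda", "accord"}},
    {"id": 2, "words": {"nizoral", "dandruff"}},
]
matches = retrieve_commercials(["Honda"], library)
```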

In accordance with another form of the present invention, a television viewing
system which identifies commercial segments during a program comprises
means for receiving transcript information associated with the program; means for detecting
"non-stop" words in the transcript information during a first time period which occur more
than a predetermined number of times; means for detecting "non-stop" words in the transcript
information during a second time period which occur more than a predetermined number of
times; and means for comparing the "non-stop" words detected during the first time period
and the "non-stop" words detected during the second time period.

In accordance with another form of the present invention, a television viewing
system which learns and stores commercial segments which occur during a program
comprises means for identifying a possible commercial segment which occurs during the
program; means for comparing "non-stop" words of the possible commercial segment with
"non-stop" words of each of a list of probable commercial segments previously identified to
determine at least one matching probable commercial segment; means for comparing
transcript text of the possible commercial segment with transcript text of the at least one
matching probable commercial segment; means for storing the transcript text which is
common to both the possible commercial segment and the at least one matching probable
commercial segment; means for removing the at least one matching stored probable
commercial segment from the list of probable commercial segments; and means for adding
the at least one matching probable commercial segment to at least one of a list of candidate
commercial segments and a list of found commercial segments.

The above and other objects, features and advantages of the present invention
will become readily apparent from the following detailed description thereof, which is to be
read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a flow diagram of the method of using transcript information to
identify commercial portions of a program in accordance with the present invention;

Fig. 2 is a flow diagram of the method of using transcript information to
identify commercial portions of a program in accordance with the present invention, Fig. 2
being a continuation of Fig. 1; and

Fig. 3 is a flow diagram of the method of learning commercial portions of a
program in accordance with the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

Referring now to the drawings, the method for using transcript information to
identify and learn commercial portions of a program is shown. The term transcript
information is intended to indicate text, for example, closed-captioned text, which is typically
provided with a video program's transmission (audio/data/video) signal and which
corresponds to the spoken and non-spoken events of the video program or other textual
source like EPG (electronic programming guide) data. The transcript information can be
obtained from video text or screen text (e.g., by detecting the subtitles of the video) and by
applying optical character recognition (OCR) on the extracted text such as that disclosed in
USSN 09/441,943 entitled "Video Stream Classification Symbol Isolation Method and
System" filed November 17, 1999, and USSN 09/441,949 entitled "Symbol Classification
with Shape Features Applied to a Neural Network" filed November 17, 1999, the entire
disclosures of each of which are incorporated herein by reference.

If the audio/data/video signal does not include a text portion (i.e., it does not
include transcript information), transcript information can be generated using techniques such
as speech-to-text conversion (if subtitles exist, subtitle recognition using OCR is employed
to generate transcript information) as known in the art. The transcript information may also
be obtained from a third party source, for example, TV Guide via the internet.

The present invention is based on the knowledge that the transcript
information of a program is capable of being analyzed and searched using known searching
techniques such as key-word searching and statistical text indexing and retrieval. Generally,
the method for commercial segment identification includes analyzing the transcript
information corresponding to a program (audio, video, data and the like) and determining the
beginning of a commercial portion of the program (or the end of a non-commercial portion of
the program) by identifying "going into commercial" cues in the transcript information, as
explained in more detail below. Once the beginning of a commercial portion of the program
has been identified, the method analyzes the transcript information to separately identify
individual commercials contained within the identified commercial portion of the program.
The signatures of individually identified commercials are then compared to previously
identified signatures (previously stored) of commercial segments, stored as separate entities
in a database, to identify specific commercial portions of the commercial segment. Once the
commercial segments have been stored in the database, the user can access the database to
search for a particular commercial. As an alternative to the foregoing, any standard commercial
detection technique based on audio/video characteristics can be used to tentatively determine
commercial areas, such as those disclosed in USSN 09/417,288 filed October 13, 1999
entitled Automatic Signature-Based Spotting, Learning and Extracting of Commercials and
Other Video Content by Dimitrova, McGee, and Agnihotri, and USSN 09/123,444 filed July
28, 1998 entitled Apparatus and Method for Locating a Commercial Disposed Within a
Video Data Stream by Dimitrova, McGee, Elenbaas, Leyvi, Ramsey and Berkowitz, the
entire disclosures of which are incorporated by reference.

Referring initially to Fig. 1, a preferred embodiment of the present invention is
shown. The method includes determining whether EPG data is available for the received
(audio/data/video) program signal (Step 8). If EPG data is not available (NO in Step 8), the
method continues with Step 63 (see Fig. 2). If EPG data is available (YES in Step 8), the
method then determines whether the received program (audio/data/video) signal includes
transcript information for the entertainment (non-commercial) portion and the commercial
(advertising) portion of the program (Step 10). If the received program signal does not
include transcript information for the entertainment and commercial portions, and the
transcript information is not available from a third party source, the method of the present
invention employs known speech-to-text conversion techniques to provide the necessary
transcript information. If the program signal includes transcript information for the
entertainment portion but does not include transcript information for the commercial portions
of the program (NO in Step 10), and if transcript information is not available from a third
party source for the commercial portions of the program, the portions of the program which
do not include the transcript information are tagged as non-program areas (i.e., a
commercial/advertising region) (Step 12). Then speech-to-text conversion is employed (Step
14) to generate the necessary transcript information for the non-program areas.

If the program signal does contain transcript information for the entertainment
and the commercial portions of the program (YES in Step 10), the transcript information is
extracted from the program signal (Step 16). The EPG data signal is then analyzed to
determine the type of program (Step 20) (e.g., talk show, news program, etc.). Other program
type determining methods can be employed such as those which analyze the transcript
information for cues as to the program type such as those disclosed in USSN 09/739,476 filed
December 18, 2000 entitled Apparatus and Method of Program Classification Using
Observed Cues in the Transcript Information, by Kavitha Devara, and USSN 09/712,681
filed November 14, 2000 entitled Method and Apparatus for the Summarization and Indexing
of Video Programs Using Transcript Information, by Lalitha Agnihotri, Kavitha Devara and
Nevenka Dimitrova, the entire disclosures of which are incorporated herein by reference.

WO 03/021954 PCT/IB02/03631

If the EPG data indicates that the program is of the type which would provide
cues in the spoken text as to the occurrence of a commercial (such as a news program or a
talk show), this fact is noted (Step 22). News programs and talk shows provide cues as to the
occurrence of commercials (called "going into commercial" cues) with phrases such as "when
we come back", "still ahead", "after these messages", "after the commercial break", and "up
next". When these phrases are identified in the transcript information, there is a high degree
of certainty that a commercial segment is soon to follow. If the program is a talk show or
news program (YES in Step 22), the transcript information is monitored for the occurrence of
the commercial cues (Step 24). When a commercial cue is detected, the region is marked as
the beginning of a commercial segment of the program (Step 26). Thereafter, the transcript
information is monitored for a first time period (Step 28) for "non-stop" words which occur
above a predetermined threshold (Step 30). It should be noted that news programs and talk
shows also provide cues in the text as to a return from a commercial break to regular
programming when the host of the news program or talk show says things like "welcome
back". When such a phrase is identified in the transcript information, there is a high degree
of certainty that a commercial segment has ended.
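
The cue monitoring of Steps 24 and 26 can be pictured as a simple phrase search over each caption line. The sketch below is illustrative only: the phrase lists contain just the examples quoted above, and the function name and return labels are hypothetical.

```python
# Phrases quoted in the description; a deployed system would likely use more.
GOING_INTO_COMMERCIAL_CUES = (
    "when we come back",
    "still ahead",
    "after these messages",
    "after the commercial break",
    "up next",
)
RETURN_FROM_COMMERCIAL_CUES = ("welcome back",)

def classify_cue(caption_text):
    """Label a caption line as a commercial cue, a return cue, or neither (None)."""
    # Normalise case and whitespace so line formatting does not hide a phrase.
    lowered = " ".join(caption_text.lower().split())
    if any(cue in lowered for cue in GOING_INTO_COMMERCIAL_CUES):
        return "going_into_commercial"
    if any(cue in lowered for cue in RETURN_FROM_COMMERCIAL_CUES):
        return "return_from_commercial"
    return None
```

Because closed-caption lines can split a phrase across two lines, a caller may want to pass the concatenation of adjacent lines rather than one line at a time.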

Non-stop words are words other than "an", "the", "of", etc. The inventors have
recognized that advertisers desire to deliver their message in a very short period of time.
Recognition of brand names (e.g., with the aid of a database) can assist in labeling
commercials. The desire for brevity leads to the
product name, company name and other identifying features being repeated frequently during
a commercial segment. If non-stop words (common to a product being advertised) appear
numerous times during a relatively short time period during the program, this is indicative of
a commercial. In one embodiment the time period is about 15 seconds and the method
determines whether non-stop words are mentioned more than once during the time period.
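
A minimal sketch of this counting step follows, assuming captions are available as (time in seconds, text) pairs. The stop-word list, the tokenizing pattern and the function name are illustrative assumptions; `min_count=2` corresponds to a word being mentioned more than once.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "for",
              "with", "is", "it", "its", "this", "by"}

def repeated_non_stop_words(captions, start, end, min_count=2):
    """Count non-stop words in captions with start <= time < end.

    Returns only the words that occur at least `min_count` times.
    """
    counts = Counter()
    for time_s, text in captions:
        if start <= time_s < end:
            for word in re.findall(r"[a-z0-9][a-z0-9'-]*", text.lower()):
                if word not in STOP_WORDS:
                    counts[word] += 1
    return {word: n for word, n in counts.items() if n >= min_count}

# Hypothetical timings; the wording is loosely based on a dandruff shampoo ad.
captions = [
    (0.0, "Note its name. Nizoral a-d."),
    (4.0, "The world's #1 prescribed ingredient for dandruff"),
    (9.0, "stay dandruff free with nizoral a-d"),
    (20.0, "welcome back"),
]
repeated = repeated_non_stop_words(captions, 0, 15)
# repeated == {"nizoral": 2, "a-d": 2, "dandruff": 2}
```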

If non-stop words above the predetermined threshold are identified in Step 30
(X > 1 in Step 30), the transcript text is monitored for a second time period (which preferably
overlaps with the prior time period) and the non-stop words which occur more than the
predetermined number of times in the second time period are noted (Step 32). If at least one
non-stop word occurs more than a predetermined number of times (X > 1 in Step 32), then a
determination is made as to whether the non-stop words of the current time period coincide
with the non-stop words of prior time periods (Step 36).
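
The overlapping time periods can be generated with a sliding window. In the sketch below the 15-second length follows the embodiment mentioned earlier, while the 7.5-second step (that is, a 50% overlap) is purely an assumption, since the description only says that the periods preferably overlap.

```python
def overlapping_windows(start, end, length=15.0, step=7.5):
    """Yield (window_start, window_end) pairs covering [start, end) with overlap."""
    t = start
    while t < end:
        # The last windows are clipped so they never run past `end`.
        yield (t, min(t + length, end))
        t += step

windows = list(overlapping_windows(0, 30))
# windows == [(0, 15.0), (7.5, 22.5), (15.0, 30), (22.5, 30)]
```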

If the non-stop words identified in the current time period and the prior time
period do not coincide (i.e., they do not have at least one common non-stop word) (NO in
Step 36), then the current and prior time periods are not part of the same commercial segment
(Step 38) and the start of the current time period is marked as the start of a new commercial
segment (Step 40). Thereafter, the transcript information is monitored for a next time period
which overlaps with at least the prior time period and the non-stop words which occur more
than a predetermined number of times above a threshold are noted (Step 42).

If in Step 42 non-stop words are identified which occur more than a
predetermined number of times (X > 1 in Step 42), a determination is made as to whether the
non-stop words of the current time period coincide with the non-stop words of prior time
periods (Step 46). If the non-stop words of the current time period coincide with non-stop
words of a prior time period (YES in Step 46), then a notation is made that the current time
period is part of the same commercial as the prior time period (Step 48). Thereafter, a
determination is made as to whether the current transcript information corresponds to a return
to the non-commercial portion of the program (Step 50). If it is determined that the current
transcript information corresponds to a return to the non-commercial portion of the program
(YES in Step 50) (e.g., the host of the show says "Welcome back"), the method returns to
Step 24. However, if it is determined that the current transcript information is not indicative
of a return to the non-commercial portion of the program (NO in Step 50), then the method
returns to Step 32 to monitor the transcript information for a new time period.

If in Step 36 it is determined that the non-stop words of the current time period
coincide with non-stop words of a prior time period (YES in Step 36), then it is determined
that the prior time period and the current time period are part of the same commercial
segment (Step 52). Thereafter, the transcript information is monitored for a next time period
which preferably overlaps with at least the prior time period. The non-stop words which
occur more than a predetermined number of times are noted (Step 54).

If the non-stop words occur more than a predetermined number of times in the
current time period (X > 1 in Step 54), a determination is made as to whether the non-stop
words of the current time period coincide with the non-stop words of the prior time periods
(Step 58). If the non-stop words of the current time period do not coincide with the non-stop
words of any one of the prior time periods (NO in Step 58), then the beginning of the current
time period is marked as the start of a new commercial segment (Step 60). Thereafter, the
method returns to Step 32.

If the non-stop words identified in the current time period coincide with the
non-stop words of one of the prior time periods (YES in Step 58), then a notation is made
that the current time period is part of the same commercial as the corresponding prior time
period which has the same non-stop words (Step 62). Then a determination is made as to
whether the current transcript information is indicative of a return to the non-commercial
portion of the program (Step 50). If it is determined that the current transcript information
corresponds to a return to the non-commercial portion of the program (YES in Step 50), the
method returns to Step 24. However, if it is determined that the current transcript
information is not indicative of a return to the non-commercial portion of the program (NO in
Step 50), then the method returns to Step 32.
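
Stripped of the step numbers, the loop described above groups consecutive time periods into one commercial while they keep sharing at least one repeated non-stop word, and starts a new commercial when they do not. The sketch below is a simplified reading: it compares each period only against the words accumulated for the commercial in progress, it skips periods with no repeated words (a case the description does not spell out), and it omits the "welcome back" exit test. The function name and data layout are hypothetical.

```python
def group_windows(window_words):
    """Group time periods into commercials by shared repeated non-stop words.

    window_words: list of sets, one per consecutive time period, each holding
    the non-stop words repeated in that period.
    Returns a list of commercials, each a list of period indices.
    """
    commercials = []
    current = []            # period indices of the commercial in progress
    current_words = set()   # non-stop words seen so far in that commercial
    for index, words in enumerate(window_words):
        if not words:
            continue        # assumption: periods without repeated words are skipped
        if current and words & current_words:
            current.append(index)        # same commercial as the prior period
            current_words |= words
        else:
            if current:
                commercials.append(current)
            current = [index]            # start of a new commercial
            current_words = set(words)
    if current:
        commercials.append(current)
    return commercials

groups = group_windows([
    {"nizoral", "dandruff"},
    {"nizoral"},
    {"lauder", "pleasures"},
    set(),
    {"pleasures"},
])
# groups == [[0, 1], [2, 4]]
```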

Returning now to Step 8, if it is determined that EPG data is not available (NO
in Step 8), then the method continues with Step 63 shown in Fig. 2. Similarly, if a
determination is made in Step 22 that the current program is not a talk show, news program
or other program which provides commercial cues to indicate the beginning of a commercial
segment of a program (NO in Step 22), then the method continues with Step 63 shown in
Fig. 2.

Turning now to Fig. 2, if the beginning of a commercial segment cannot be
identified by either commercial cues or EPG data, the transcript information for the program
is continually monitored for specific time periods to identify non-stop words that occur.
Thereafter the number of occurrences of each of the non-stop words which occur in the
predetermined time period are noted (Step 63). Thereafter, a determination is made as to
whether the detected non-stop words occur more than a predetermined number of times
within the time period (Step 64). If non-stop words do not occur more than a predetermined
number of times in the time period (NO in Step 64), the method returns to Step 63 wherein
the transcript information is monitored for non-stop words. If, however, non-stop words are
identified in the time period and the non-stop words occur more than a predetermined number
of times (YES in Step 64), then the portion of the program which corresponds to the time
period is identified as the beginning of a commercial segment (Step 66). Thereafter, the
transcript information is monitored for a next time period which overlaps with the prior time
period and the non-stop words which occur more than a predetermined number of times are
noted (Step 68). If individual non-stop words occur in the time period more than a
predetermined number of times (X > 1 in Step 68), then a determination is made as to whether
the non-stop words of the current time period coincide with the non-stop words of a prior
time period (Step 72).

If the non-stop words identified in the current time period and the non-stop
words of the prior time period do not coincide (NO in Step 72), then the current and prior
time periods are not part of the same commercial segment (Step 74) and the start of the
current time period is marked as the start of a new commercial (Step 76). Thereafter, the
transcript information is monitored for a next time period which overlaps with at least the
prior time period and the non-stop words which occur more than a predetermined number of
times above a threshold are noted (Step 78).

If in Step 78 non-stop words are identified which occur more than a
predetermined number of times (X > 1 in Step 78), a determination is made as to whether the
non-stop words of the current time period coincide with the non-stop words of prior time
periods (Step 82). If the non-stop words of the current time period coincide with non-stop
words of a prior time period (YES in Step 82), then a notation is made that the current time
period is part of the same commercial as the prior time period (Step 84). Thereafter, a
determination is made as to whether the current transcript information corresponds to a return
to the non-commercial portion of the program (Step 86). If it is determined that the current
transcript information corresponds to a return to the non-commercial portion of the program
(YES in Step 86), the method returns to Step 62. However, if it is determined that the current
transcript information is not indicative of a return to the non-commercial portion of the
program (NO in Step 86), then the method returns to Step 68 to monitor the transcript
information for a new time period.

If in Step 72 it is determined that the non-stop words of the current time period
coincide with non-stop words of a prior time period (YES in Step 72), then it is determined
that the prior time period and the current time period are part of the same commercial
segment (Step 88). Thereafter, the transcript information is monitored for a next time period
which preferably overlaps with at least the prior time period and the non-stop words which
occur more than a predetermined number of times are noted (Step 90). If non-stop words
occur more than a predetermined number of times in the current time period (X > 1 in Step
90), a determination is made as to whether the non-stop words of the current time period
coincide with the non-stop words of the prior time periods (Step 94). If the non-stop words
of the current time period do not coincide with the non-stop words of any one of the prior
time periods (NO in Step 94), then the start of the current time period is marked as the start of
a new commercial (Step 98). Thereafter, the method returns to Step 68. If the non-stop
words identified in the current time period coincide with the non-stop words of the prior time
periods (YES in Step 94), then a notation is made that the current time period is part of the
same commercial as the prior time period which has the same non-stop words (Step 96).
Then a determination is made as to whether the current transcript information is indicative of
a return to the non-commercial portion of the program (Step 86). If it is determined that the
current transcript information corresponds to a return to the non-commercial portion of the
program (YES in Step 86), the method returns to Step 62. However, if it is determined that
the current transcript information is not indicative of a return to the non-commercial portion
of the program (NO in Step 86), then the method returns to Step 68.

Based upon the above analysis, if non-stop words occur multiple times in a
given time segment, and the same words occur for example in the next two overlapping time
segments, the method stores the transcript text from the beginning of the first time segment to
the end of the third time segment as a possible commercial. Further, if it so happens that
certain words occur multiple times in the third time segment and continue to occur until the
sixth time segment, then the method stores the transcript text from the beginning of the third
time segment to the end of the sixth time segment as a next commercial. The next time similar
keywords are observed, a sub-segment matching method can be used (explained below)
to match the current possible commercial to the two commercials that are stored. This will
match the overlapping part of one text to the other possible commercial texts. Assuming that
the current commercial is bounded by different commercials than the prior occurrence of the
same commercial, the next time the commercial appears, only the center portion of both the
segments matches the current commercial. This enables extraneous portions of the commercial
segments to be removed from the stored commercial so that what is left is only the subject
commercial. This might include only a part of the first time segment, the entire second time
segment and a part of the third time segment as the actual commercial.
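
The trimming effect described above can be illustrated with a longest-common-run search over words. This sketch uses Python's `difflib.SequenceMatcher` as a stand-in; it is not the sub-segment matching method of the specification, and the function name and sample texts are hypothetical.

```python
from difflib import SequenceMatcher

def common_sub_segment(stored_text, current_text):
    """Return the longest run of consecutive words present in both texts."""
    words_a = stored_text.lower().split()
    words_b = current_text.lower().split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    match = matcher.find_longest_match(0, len(words_a), 0, len(words_b))
    return " ".join(words_a[match.a:match.a + match.size])

# The same commercial, bounded by different neighbours on two occasions.
stored = ("good night everyone note its name nizoral a-d "
          "only twice a week discover estee lauder")
current = ("drive the new sedan note its name nizoral a-d "
           "only twice a week call now")
core = common_sub_segment(stored, current)
# core == "note its name nizoral a-d only twice a week"
```

Only the shared centre portion survives, which is the behaviour the paragraph above relies on to strip text from neighbouring commercials.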

As a result of the present invention, individual commercials of a
multi-commercial portion of a broadcast program can be identified using transcript information and
can be separated from each other and individually stored in memory for a variety of uses such
as identifying individual commercials during a program and searching for a particular type of
commercial (e.g., auto) or a commercial for a particular product (e.g., Honda Accord).

Based on analysis of actual broadcast commercials, the inventors have
determined that if a non-stop word occurs at least three times within a pre-determined time
period (15 seconds), this is indicative of the occurrence of a commercial. The inventors have
discovered that it is unlikely that a non-stop word would occur in a non-commercial portion
of a program more than three times during any 15 second interval.

The following text is the closed-captioned text extracted from the Late-Night
Show with David Letterman which includes two commercials.



1367275   I'll tell you what, ladies and
1368707   gentlemen, when we come back
1369638   we'll be playing here.
1373975   (Cheers and applause)
1374847   (band playing) of using a dandruff shampoo
1426340   Note how isolated it makes people feel.
1430736   Note its unpleasant smell, the absence of rich lather.
1433842   Note its name. Nizoral a-d.
1437276   The world's #1 prescribed ingredient for dandruff...
1440019   In non-prescription strength.
1442523   People can stay dandruff free by doing this with nizoral a-d
1444426   only twice a week.
1447560   Only twice a week. What a pity.
1449023   Nizoral a-d;
1451597   I see skies of blue
1507456   and clouds of white
1509419   the bright, blessed day
1512724   the dogs say good night
1515728   and i think to myself...
1518432   Discover estee lauder pleasures
1520105   and lauder pleasures for men.
1521937   Pleasures to go. For her.
1524842   For him.
1526674   Each set free with a purchase
1527806   of estee lauder pleasures
1528947   of lauder pleasures for men.
1530450   ...Oh, yeah.
1532052
1534155
1566922   (Band playing)
1586770   »dave: It's flue shot friday.
1587572   You know, i'd like to take a
1588473   minute here to mention the...


The closed-captioning text demonstrates the effectiveness of the invention
wherein the words "Nizoral", "A-D", "dandruff", and "shampoo" appeared at least three times
during the first commercial (15 second) segment between time stamps 1374847 and 1449023.
Moreover, the words "lauder" and "pleasures" appeared more than three times in the second
commercial between time stamps 1451597 and 1528947. This is based on the fact that
advertisers want to deliver their message in a short period of time and therefore must
frequently repeat the product name, company and other identifying features of the product to
the audience. By detecting the occurrence of these non-stop words in the transcript
information in a predetermined time period, individual commercials can be detected and
separated from each other.
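As a non-limiting sketch, the separation of individual commercials may be illustrated in Python by grouping consecutive time periods whose repeated non-stop words overlap. The function name `split_commercials` and the rule that a single shared word is enough to link two periods are assumptions of this sketch.

```python
def split_commercials(window_words):
    """Group consecutive windows whose repeated non-stop words overlap.

    window_words: list of sets, one per consecutive time period.
    Returns a list of groups; each group lists the window indices that
    are presumed to belong to the same commercial.
    """
    groups = []
    prev = set()
    for i, words in enumerate(window_words):
        if not words:            # no repeated words: not treated as a commercial
            prev = set()
            continue
        if prev & words:         # shares a word with the previous window
            groups[-1].append(i)
        else:                    # different words: a new commercial starts
            groups.append([i])
        prev = words
    return groups

windows = [
    {"nizoral", "dandruff"},
    {"nizoral", "a-d"},
    {"lauder", "pleasures"},
    {"pleasures"},
    set(),
]
print(split_commercials(windows))  # [[0, 1], [2, 3]]
```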

After a commercial portion of a program has been identified, the individual
commercials within the commercial portion of a broadcast are preferably separated from one
another and stored in a memory/database for retrieval at a later time (e.g., so that a user
could retrieve a car advertisement by searching the memory/database which stores the
individual commercials and be presented with commercials which match the user's
requirements).
Turning now to Fig. 3, the method for learning commercials is shown wherein
the memory/database which stores the identified commercials includes commercial segments
which are stored in the found commercial list, the candidate commercial list, and the probable
commercial list.

Initially, a search for a new commercial area is conducted (Step 120). The
search for a commercial area may correspond to the methods shown in Figs. 1 and 2
described above or other known commercial detection methods such as those disclosed in
USSN 09/123,444 filed July 28, 1998 entitled "Apparatus and Method for Locating a
Commercial Disposed Within a Video Data Stream", by Nevenka Dimitrova, Thomas
McGee, Herman Elenbaas, Eugene Leyvi, Carolyn Ramsey and David Berkowitz, the entire
disclosure of which is incorporated herein by reference. A determination is then made as to
whether a new commercial area is detected (Step 122). If a new commercial area is not
detected (NO in Step 122), then the method returns to Step 120 where the search is continued
for a new commercial area. However, if a new commercial area is detected (YES in Step
122), then the non-stop words which occur more than a predetermined number of times
and which correspond to the new commercial area are compared with the non-stop words of
the commercials which are part of the "found" commercial list. The found commercial list
corresponds to commercials which have been identified more than twice and therefore a high
degree of certainty exists as to the correctness of the "non-stop" words and transcript text
which is stored. If a match between the non-stop words of the new commercial area and the
non-stop words of one of the commercials listed in the found commercial list is identified
(YES in Step 126), then a counter corresponding to the identified commercial is incremented
to indicate that this is an active commercial which still appears during broadcast programs
(Step 128). If the counter is not incremented for a period of time (e.g., 1 month), then the
commercial and the corresponding non-stop words and transcript text are purged from
memory because the commercial is not active. Alternatively, the commercial can be retained
indefinitely in the database.
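Purely as an illustration, the counter and the purging of inactive commercials may be sketched in Python as follows. The class `FoundCommercial`, the helper names and the reading of "1 month" as 30 days are assumptions of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FoundCommercial:
    """Hypothetical record for one entry of the found commercial list."""
    non_stop_words: frozenset
    transcript: str
    last_seen: datetime
    count: int = 1

def record_sighting(entry, now):
    """Increment the counter when the commercial is seen again (cf. Step 128)."""
    entry.count += 1
    entry.last_seen = now

def purge_inactive(found_list, now, max_idle=timedelta(days=30)):
    """Keep only the commercials whose counter was incremented recently."""
    return [c for c in found_list if now - c.last_seen <= max_idle]

now = datetime(2000, 3, 1)
shampoo = FoundCommercial(frozenset({"nizoral"}), "Note its name.", datetime(2000, 2, 20))
perfume = FoundCommercial(frozenset({"lauder"}), "Pleasures to go.", datetime(2000, 1, 1))
record_sighting(shampoo, now)
active = purge_inactive([shampoo, perfume], now)
print([c.transcript for c in active])  # ['Note its name.']
```

Retaining commercials indefinitely, as the alternative above contemplates, corresponds to simply not calling `purge_inactive`.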

If the non-stop words of the new commercial area do not correspond to
non-stop words of the commercials contained in the list of found commercials (NO in Step
126), then a comparison is made between the non-stop words of the new commercial area and
the non-stop words of the commercials of the candidate list of commercials (Step 130). If the
non-stop words of the new commercial area match the non-stop words of at least one of the
commercials identified in the candidate list (YES in Step 132), then the commercial which
was identified in the candidate list is deleted from the candidate list and moved to the
found commercial list along with the corresponding non-stop words and transcript text (Step
134). If, however, the non-stop words of the new commercial area do not match the non-stop
words of the commercials contained in the candidate list (NO in Step 132), then a comparison
is made between the non-stop words of the new commercial area and the non-stop words
contained in the probable list of commercials (Step 136). If a match is found between the
non-stop words of the new commercial area and the non-stop words of one of the
commercials contained in the probable list of commercials (YES in Step 138), then the
commercial identified from the list of probable commercials is deleted from the probable list
of commercials and moved to the candidate list of commercials (Step 140). If, however, a
match between non-stop words of the new commercial area and the non-stop words of one of
the commercials contained in the list of probable commercials is not obtained, then the new
commercial area which includes the identified non-stop words and the transcript text are
stored in the probable list of commercials.
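The cascade of comparisons described with reference to Fig. 3 may be sketched, by way of example only, in Python. The function name `learn`, the dictionary layout of each list entry and the rule that two areas match when they share any non-stop word are assumptions of this sketch.

```python
def learn(words, transcript, found, candidate, probable):
    """Apply one pass of the found/candidate/probable cascade.

    Each list holds dicts with keys 'words' (a set), 'text' and 'count'.
    Returns the name of the list in which the commercial ends up.
    """
    def find(entries):
        # First stored commercial sharing at least one non-stop word, if any.
        return next((e for e in entries if e["words"] & words), None)

    hit = find(found)
    if hit is not None:            # YES in Step 126: increment counter (Step 128)
        hit["count"] += 1
        return "found"
    hit = find(candidate)
    if hit is not None:            # YES in Step 132: promote to found (Step 134)
        candidate.remove(hit)
        found.append(hit)
        return "found"
    hit = find(probable)
    if hit is not None:            # YES in Step 138: promote to candidate (Step 140)
        probable.remove(hit)
        candidate.append(hit)
        return "candidate"
    # No match anywhere: store as a new probable commercial.
    probable.append({"words": set(words), "text": transcript, "count": 0})
    return "probable"

found, candidate, probable = [], [], []
ad = {"nizoral", "dandruff"}
history = [learn(ad, "Note its name. Nizoral a-d.", found, candidate, probable)
           for _ in range(4)]
print(history)  # ['probable', 'candidate', 'found', 'found']
```

In this sketch a commercial reaches the found list on its third sighting, and only sightings after that increment its counter.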

In view of the method shown in Fig. 3, whenever a new potential commercial
area is detected, the non-stop words identified in the transcript information are compared
with the non-stop words from the found list, candidate list, and probable list of commercials
which were previously identified. If the non-stop words of the new potential commercial do
not match the non-stop words of the commercials identified in the found list, candidate list,
or probable list of commercials, then the new potential commercial is added to the probable
list of commercials. That is, the non-stop words of the new potential commercial and the
actual transcript of the new potential commercial are added to the probable list of commercials.
However, if some of the non-stop words of the new potential commercial match the non-stop
words of at least one of the commercials identified in one of the found list, candidate list, or
probable list of commercials, the transcript text of the new potential commercial and the
matching commercial from the list of commercials are compared using an approximate
matching technique such as the approximate string matching "Shift-Or Algorithm" as described
at pages 186-192 of the Computer Science and Engineering Handbook, by Allen C. Tucker
(Editor-in-Chief) 1997, the disclosure of which is incorporated herein by reference. The
"Shift-Or Algorithm" accounts for spurious characters (words, phrases, sentences) that may
be introduced into the text because the transcript text may be obtained or generated from
multiple sources. By using the "Shift-Or Algorithm" the transcript text which is common to the
new potential commercial and the commercial identified from the list of commercials is
retained and the text which is not coincident is ignored. Typically the text which is ignored
occurs at the beginning or end of the actual commercial due to the absence of non-stop words
or because these portions belong to a commercial segment which was adjacent (contiguous)
with the newly identified commercial segment.
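The retention of the common transcript text may be illustrated with the following Python sketch. It is not an implementation of the Shift-Or Algorithm; it substitutes the standard-library `difflib.SequenceMatcher` as a simple stand-in that keeps the word runs shared by both transcripts. The function name `common_transcript` and the `min_words` parameter are assumptions.

```python
from difflib import SequenceMatcher

def common_transcript(new_text, stored_text, min_words=2):
    """Return the words shared, in order, by both transcripts.

    Runs shorter than min_words are treated as coincidental and dropped.
    """
    a = new_text.lower().split()
    b = stored_text.lower().split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    kept = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            kept.extend(a[block.a:block.a + block.size])
    return " ".join(kept)

new = "applause and cheers discover estee lauder pleasures for men oh yeah"
stored = "discover estee lauder pleasures for men each set free"
print(common_transcript(new, stored))
# discover estee lauder pleasures for men
```

The leading and trailing words that differ between the two transcripts are dropped, which mirrors the observation above that the ignored text typically lies at the beginning or end of the commercial.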

It is important to note that the above learning procedure is run continuously for
programs that do not contain "going into commercial" cues.

The present invention is designed to store the transcripts and optionally a
signature along with the commercial in a database. The system may also be coupled to a
service provider which downloads or provides access to all of the currently airing
commercials, or a memory/database of current commercials could be coupled to the system
to provide commercial knowledge at initial start-up of the system. When the user wants to
retrieve a specific type of advertisement (e.g., a car advertisement), the user can provide
search parameters and a simple string matching will retrieve the desired commercial,
searching the found list, candidate list and probable list in order. In addition, the transcripts
of the stored commercials can be used as signatures to identify the advertisement during a
broadcast program at a later time. The signature can also be used by advertisers to ensure
that their commercials have been aired.
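A minimal sketch of such a retrieval, assuming that each list simply holds transcript strings and that a case-insensitive substring test serves as the "simple string matching", might read as follows in Python. The function name `retrieve` is hypothetical.

```python
def retrieve(query, found, candidate, probable):
    """Return (list_name, transcript) pairs whose transcript contains the query.

    The lists are searched in the order found, candidate, probable.
    """
    q = query.lower()
    results = []
    for name, entries in (("found", found),
                          ("candidate", candidate),
                          ("probable", probable)):
        for transcript in entries:
            if q in transcript.lower():
                results.append((name, transcript))
    return results

found = ["Discover estee lauder pleasures"]
candidate = ["Note its name. Nizoral a-d."]
probable = ["Pleasures to go. For her."]
print(retrieve("pleasures", found, candidate, probable))
# [('found', 'Discover estee lauder pleasures'), ('probable', 'Pleasures to go. For her.')]
```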

It should also be mentioned that the time periods for monitoring non-stop
words can be any desired length. Since commercials are typically only 15 to 30 seconds
long, it has been found that the time period should preferably be about 15 seconds in
duration. While it is foreseen that the time periods need not overlap, it has been determined
that overlapping time periods are preferable. In one example the first time period covers the
time from zero seconds to 15 seconds, the second time period covers a time period from 5
seconds to 20 seconds, a third time period covers the period from 10 seconds to 25 seconds
and the fourth time period covers a time from 15 seconds to 30 seconds. With this time
period structure a more definitive indication of a beginning or end of commercial segments
can be provided. If it is determined that the first, second and third time periods have the
same non-stop words, then the transcript information for the first, second and third time
periods is presented for storage together in the database.
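The overlapping time period structure of the example above may be generated, by way of a brief Python sketch, as follows. The function name `overlapping_windows` and its parameter names are assumptions; the 15 second length and 5 second offset are taken from the example.

```python
def overlapping_windows(total_seconds, length=15, step=5):
    """Return (start, end) pairs for windows of `length` seconds every `step` seconds."""
    return [(s, s + length) for s in range(0, total_seconds - length + 1, step)]

print(overlapping_windows(30))  # [(0, 15), (5, 20), (10, 25), (15, 30)]
```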

It should be noted that the total number of time periods which can be linked
together should be set to a limit (of about the equivalent of one or two minutes) so that an
entire program is not stored due to the repetition of certain words or names. For example,
since commercials are rarely over a minute long, no more than 12 overlapping 15 second
windows as described above should be grouped together as a possible commercial.
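As a worked illustration in Python, and assuming the 5 second offset between successive windows used in the earlier example, the time covered by a run of linked windows and the enforcement of the limit may be sketched as follows. Both function names are hypothetical.

```python
def linked_span_seconds(n_windows, length=15, step=5):
    """Total time covered by n consecutive overlapping windows."""
    return length + step * (n_windows - 1) if n_windows > 0 else 0

def cap_group(window_indices, max_windows=12):
    """Keep at most max_windows linked windows as one possible commercial."""
    return window_indices[:max_windows]

print(linked_span_seconds(12))          # 70
print(len(cap_group(list(range(40)))))  # 12
```

Under that assumption, twelve linked windows cover 15 + 11 x 5 = 70 seconds, which is in keeping with the stated limit of about one or two minutes.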

It should also be noted that it is foreseen that the present invention could
provide the user with links, related to the commercials being viewed, that the user might be
interested in visiting. For example, if a user is viewing a particular car commercial, the user
can be presented with loan commercials, car insurance commercials and/or car dealerships
whose commercials are stored in the database.

It is also foreseen that the apparatus can include a database of commercials
and brand names. If a specific brand name as identified by the database is mentioned
numerous times within a predetermined period of time, this is indicative of the occurrence of
a commercial. The database of commercials and commercial names can also aid in labeling a
commercial as being for a particular product, and to identify how many commercials there
are in a given commercial segment.
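A minimal Python sketch of the brand-name test is given below. The contents of the `BRANDS` table, the function names and the threshold of two mentions (standing in for "numerous times") are assumptions made for illustration.

```python
# Hypothetical brand-name database mapping a brand to a product label.
BRANDS = {"nizoral": "dandruff shampoo", "estee lauder": "fragrance"}

def brand_mentions(window_text, brands=BRANDS):
    """Count how often each known brand name occurs in the window text."""
    text = window_text.lower()
    return {b: text.count(b) for b in brands if b in text}

def label_window(window_text, min_mentions=2):
    """Return the product labels of brands mentioned at least min_mentions times."""
    hits = brand_mentions(window_text)
    return sorted(BRANDS[b] for b, n in hits.items() if n >= min_mentions)

text = ("Note its name. Nizoral a-d. People can stay dandruff free "
        "with nizoral a-d. Nizoral a-d;")
print(brand_mentions(text))  # {'nizoral': 3}
print(label_window(text))    # ['dandruff shampoo']
```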

It is also foreseen that commercial segments of a program can be identified by
observing the length (i.e., number of words) of each line of closed-captioned text. The
system could determine a running average of words/line. If the number of words in a specific
number of lines exceeds the running average, or if the closed-captioned format changes, this
is indicative of a commercial segment.
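One possible reading of the words-per-line test is sketched below in Python. The function name `commercial_line_indices`, the choice of three consecutive lines as the "specific number of lines" and the use of an average over all earlier lines are assumptions of this sketch.

```python
def commercial_line_indices(lines, run=3):
    """Return indices at which `run` consecutive lines exceeded the running average.

    The running average is the mean number of words per line over all
    earlier lines of closed-captioned text.
    """
    flagged = []
    total = 0     # words seen in earlier lines
    streak = 0    # consecutive lines longer than the running average
    for i, line in enumerate(lines):
        n = len(line.split())
        if i > 0 and n > total / i:
            streak += 1
        else:
            streak = 0
        if streak >= run:
            flagged.append(i)
        total += n
    return flagged

word_counts = [4, 4, 4, 4, 8, 9, 10, 3]
lines = [" ".join(["w"] * n) for n in word_counts]
print(commercial_line_indices(lines))  # [6]
```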


Having described specific embodiments of the invention with reference to the
accompanying drawing, it will be appreciated that the present invention is not limited to
those precise embodiments and that various changes and modifications can be effected
therein by one of ordinary skill in the art without departing from the scope or spirit of the
invention defined by the appended claims.

CLAIMS:



1. A method of identifying commercial segments during a program comprising
the steps of:

a. using transcript information associated with the program;

b. detecting "non-stop" words in the transcript information during a first time
period which occur more than a predetermined number of times;

c. detecting "non-stop" words in the transcript information during a second
time period which occur more than a predetermined number of times; and

d. comparing the "non-stop" words detected during the first time period and
the "non-stop" words detected during the second time period.

2. The method of identifying commercial segments according to claim 1 further 
comprising the steps of: 

detecting "non-stop" words in the transcript information during a third time 
period which occur more than a predetermined number of times, 

wherein if the "non-stop" words detected during the first time period which
occur more than the predetermined number of times are different from the "non-stop" words
detected during the second time period which occur more than the predetermined number of
times, the first time period is indicative of a first commercial segment and the second time
period is indicative of a second commercial segment;

wherein if at least one of the "non-stop" words detected during the first time 
period which occur more than the predetermined number of times is the same as at least one 
of the "non-stop" words detected during the second time period which occur more than the 
predetermined number of times, the first time period and second time period are indicative of 
a common commercial segment, 

wherein if the "non-stop" words detected during the third time period which
occur more than the predetermined number of times are different from the "non-stop" words
detected during the second time period and the first time period, the third time period is
indicative of a commercial segment which is not associated with the commercial segment of
either of the first or second time periods, and

wherein if the "non-stop" words detected during the third time period which
occur more than the predetermined number of times are the same as the "non-stop" words
detected during at least one of the second time period and the first time period, the third time
period is indicative of a commercial segment which is associated with the commercial
segment of the corresponding first or second time period.

3. The method of identifying commercial segments according to claim 2 wherein
the second time period overlaps in time with respect to the first time period, and the third
time period overlaps in time with respect to at least the second time period.

4. The method of identifying commercial segments according to claim 1 wherein 
a beginning of a commercial segment is detected if a number of occurrences of "non-stop" 
words during a predetermined time period is at least equal to a predetermined value. 

5. The method of identifying commercial segments according to claim 1 further
comprising the steps of:

receiving an audio/data/video signal which includes at least one of transcript
information and electronic programming guide (EPG) data; and

analyzing the transcript information and the electronic programming guide
(EPG) data to determine a type of program being broadcast and whether the type of program
being broadcast includes "going into commercial" and "going out of commercial" cues.

6. The method of identifying commercial segments according to claim 1 further
comprising the steps of:

receiving an audio/data/video signal which includes at least one of transcript
information and electronic programming guide (EPG) data; and

continuously searching the transcript information for an end of a commercial
segment,

wherein when a beginning and end of a commercial segment have been
identified, storing at least one of the "non-stop" words and the transcript information
interposed between the beginning and end of the commercial segment.

7. A method of learning and storing commercial segments which occur during a
program comprising the steps of:

a. identifying a possible commercial segment which occurs during the
program;

b. comparing "non-stop" words of the possible commercial segment with
"non-stop" words of each of a list of probable commercial segments previously identified to
determine at least one matching probable commercial segment;

c. comparing transcript text of the possible commercial segment with
transcript text of the at least one matching probable commercial segment;

d. storing the transcript text which is common to both the possible
commercial segment and the at least one matching probable commercial segment;

e. removing the at least one matching stored probable commercial segment
from the list of probable commercial segments; and

f. adding the at least one matching probable commercial segment to at least
one of a list of candidate commercial segments and a list of found commercial segments.


8. The method of learning and storing commercial segments according to claim 7 
wherein if the "non-stop" words of at least one of the probable commercial segments are not 
identified as matching the "non-stop" words of the possible commercial segment, the method 
further comprises the step of: 

at least one of adding the possible commercial segment to the list of probable 
commercial segments and comparing the possible commercial segment to the list of probable 
commercial segments. 


9. The method of learning and storing commercial segments according to claim 
7, wherein step a comprises the steps of: 

1. using transcript information associated with the program; 

2. detecting "non-stop" words in the transcript information during a first time 
period which occur more than a predetermined number of times; 

3. detecting "non-stop" words in the transcript information during a second 
time period which occur more than a predetermined number of times; and 

4. comparing the non-stop words detected during the first time period and the 
"non-stop" words detected during the second time period. 

10. The method of learning and storing commercial segments according to claim
9, wherein if the "non-stop" words detected during the first time period which occur more
than the predetermined number of times are different from the "non-stop" words detected
during the second time period which occur more than the predetermined number of times, the
first time period is indicative of a first commercial segment and the second time period is
indicative of a second commercial segment; and

wherein if at least one of the "non-stop" words detected during the first time
period which occur more than the predetermined number of times is the same as at least one
of the "non-stop" words detected during the second time period which occur more than the
predetermined number of times, the first time period and second time period are indicative of
a common program segment.

11. The method of learning and storing commercial segments according to claim
10 further comprising the steps of:

detecting "non-stop" words in the transcript information during a third time
period which occur more than a predetermined number of times,

wherein if the "non-stop" words detected during the third time period which
occur more than the predetermined number of times are different from the "non-stop" words
detected during the second time period and the first time period, the third time period is
indicative of a commercial segment which is not associated with the commercial segment of
either of the first and second time periods, and

wherein if the "non-stop" words detected during the third time period which
occur more than the predetermined number of times are the same as the "non-stop" words
detected during at least one of the second time period and first time period, the third time
period is indicative of a commercial segment which is associated with the commercial
segment of either of the corresponding first and second time periods.

12. A method of learning and storing commercial segments which occur during a
program comprising the steps of:

a. identifying a possible commercial segment which occurs during the
program;

b. comparing "non-stop" words of the possible commercial segment with
"non-stop" words of each of a list of found commercial segments previously identified to
determine at least one matching found commercial segment;

c. comparing the transcript text of the possible commercial segment with
transcript text of the at least one matching found commercial segment;

d. storing the transcript text which is common to both the possible
commercial segment and the at least one matching found commercial segment; and

e. incrementing a counter which indicates the frequency of occurrence of the
at least one matching found commercial segment.

13. A method of learning and storing commercial segments according to claim 12
wherein if the "non-stop" words of at least one of the found commercial segments is not
identified as matching the "non-stop" words of the possible commercial segment, comparing
the "non-stop" words of the possible commercial segment to "non-stop" words of a list of
candidate commercial segments, and

wherein if the "non-stop" words of at least one of the stored candidate
commercial segments is not identified as matching the "non-stop" words of the possible
commercial segment, adding the possible commercial segment to the list of probable
commercial segments.

14. A method of retrieving a stored commercial segment comprising the steps of:

a. identifying at least one non-stop word indicative of a desired commercial
segment;

b. identifying stored commercial segments which correspond to the identified
non-stop word; and

c. outputting the identified stored commercial segments which correspond to
the at least one identified non-stop word.

15. The method of retrieving a stored commercial segment according to claim 14
further comprising the step of marking the identified stored commercial segment as a
commercial area.

16. A television viewing system which identifies commercial segments during a
program comprising:

means for receiving transcript information associated with the program;

means for detecting "non-stop" words in the transcript information during a
first time period which occur more than a predetermined number of times;

means for detecting "non-stop" words in the transcript information during a
second time period which occur more than a predetermined number of times; and

means for comparing the "non-stop" words detected during the first time
period and the "non-stop" words detected during the second time period.

17. The television viewing system according to claim 16, further comprising:

means for detecting "non-stop" words in the transcript information during a
third time period which occur more than a predetermined number of times,

wherein if the "non-stop" words detected during the first time period which
occur more than the predetermined number of times are different from the "non-stop" words
detected during the second time period which occur more than the predetermined number of
times, the first time period is indicative of a first commercial segment and the second time
period is indicative of a second commercial segment;

wherein if at least one of the "non-stop" words detected during the first time
period which occur more than the predetermined number of times is the same as at least one
of the "non-stop" words detected during the second time period which occur more than the
predetermined number of times, the first time period and second time period are indicative of
a common commercial segment;

wherein if the "non-stop" words detected during the third time period which
occur more than the predetermined number of times are different from the "non-stop" words
detected during the second time period and the first time period, the third time period is
indicative of a commercial segment which is not associated with the commercial segment of
either of the first or second time periods; and

wherein if the "non-stop" words detected during the third time period which
occur more than the predetermined number of times are the same as the "non-stop" words
detected during at least one of the second time period and the first time period, the third time
period is indicative of a commercial segment which is associated with the commercial
segment of the corresponding first or second time period.

18. A television viewing system which learns and stores commercial segments
which occur during a program, comprising:

means for identifying a possible commercial segment which occurs during the
program;

means for comparing "non-stop" words of the possible commercial segment
with "non-stop" words of each of a list of probable commercial segments previously
identified to determine at least one matching probable commercial segment;

means for comparing transcript text of the possible commercial segment with
transcript text of the at least one matching probable commercial segment;

means for storing the transcript text which is common to both the possible
commercial segment and the at least one matching probable commercial segment;

means for removing the at least one matching stored probable commercial
segment from the list of probable commercial segments; and

means for adding the at least one matching probable commercial segment to at
least one of a list of candidate commercial segments and a list of found commercial
segments.

19. A television viewing system which learns and stores commercial segments
which occur during a program, comprising:

means for identifying a possible commercial segment which occurs during the
program;

means for comparing "non-stop" words of the possible commercial segment
with "non-stop" words of each of a list of found commercial segments previously identified
to determine at least one matching found commercial segment;

means for comparing the transcript text of the possible commercial segment
with transcript text of the at least one matching found commercial segment;

means for storing the transcript text which is common to both the possible
commercial segment and the at least one matching found commercial segment; and

means for incrementing a counter which indicates the frequency of occurrence
of the at least one matching found commercial segment.

20. A television viewing system which retrieves a stored commercial segment,
comprising:

means for identifying at least one non-stop word indicative of a desired
commercial segment;

means for identifying stored commercial segments which correspond to the
identified non-stop word; and

means for outputting the identified stored commercial segments which
correspond to the at least one identified non-stop word.